Abstract

When users want to establish wireless communication 
between/among their devices, the channel has to be 
bootstrapped first. To prevent any malicious control of 
or eavesdropping over the communication, the chan- 
nel is desired to be authenticated and confidential. The 
process of setting up a secure communication chan- 
nel between two previously unassociated devices is re- 
ferred to as "Secure Device Pairing". When there is 
no prior security context, e.g., shared secrets, com- 
mon key servers or public key certificates, device pair- 
ing requires user involvement into the process. The 
idea usually involves leveraging an auxiliary human- 
perceptible channel to authenticate the data exchanged 
over the insecure wireless channel. 

We observe that the focus of prior research has 
mostly been limited to pairing scenarios where a sin- 
gle user controls both the devices. In this paper, we 
consider more general and emerging "two-user" sce- 
narios, where two different users establish pairing be- 
tween their respective devices. Although a number of 
pairing methods exist in the literature, only a handful
of those are applicable to the two-user setting. We
present the first study to identify the methods practi- 
cal for two-user pairing scenarios, and comparatively 
evaluate the usability of these methods. Our results 
identify methods best-suited for users, in terms of ef- 
ficiency, error-tolerance and of course, usability. Our 
work sheds light on the applicability and usability of
pairing methods for emerging two-user scenarios, a 
topic largely ignored so far. 

Keywords — Wireless Security, Device Authentication, 
Pairing, Usability 

1 Introduction 

Increasing proliferation of personal gadgets (including 
PDAs, cell-phones, headsets, cameras and media play- 
ers) equipped with wireless communication (e.g., Wi- 
Fi, Bluetooth, WUSB) continuously opens up new ser- 
vices and possibilities for ordinary users. There are 
many usage scenarios where two devices need to "work 
together." In commonly occurring, so called "single- 
user" scenarios, both communicating devices are con- 
trolled by a single user (Alice). Examples include com- 
munication between Alice's Bluetooth headset and her 
cellphone, her PDA and a wireless printer, or her lap- 
top and a wireless access point. On the other hand, 
"two-user" scenarios, whereby two different users (Al- 
ice and Bob) control their respective devices, are also 
fast emerging. Examples include communication be- 
tween Alice's and Bob's PDAs/laptops/cell phones for 
sharing files, exchanging digital business cards, multi- 
player games, messaging, chatting or collaborative ap- 
plications. 

The surge in popularity of wireless devices, however,
brings about various security risks. The wireless
communication channel is easy to eavesdrop upon and
to manipulate, raising very real threats, notably
so-called Man-in-the-Middle (MiTM) or Evil Twin attacks.
To mitigate these attacks, secure communication
must first be bootstrapped, i.e., devices must be
securely paired¹ or initialized.

One of the main challenges in secure device pair- 
ing is that, due to sheer diversity of devices and lack 
of standards, no global security infrastructure exists 
today and none is likely for the foreseeable future. 
Consequently, traditional cryptographic means (such 
as authenticated key exchange protocols) are unsuit- 
able, since unfamiliar devices have no prior security 
context and no common point of trust. Moreover, the 
use of a common wireless channel is insufficient to es- 
tablish a secure context, since such channels are not 
perceivable by the user. 

One valuable and established research direction is 
the use of auxiliary - also referred to as "out-of-band" 
(OOB) - channels, which are both perceivable and 
manageable by the human user(s) who own and op- 
erate the devices. An OOB channel takes advantage 
of human sensory capabilities to authenticate human- 
imperceptible (and hence subject to MiTM attacks) in- 
formation exchanged over the wireless channel. OOB 
channels can be realized using acoustic, visual and tac- 
tile senses. Unlike the main (usually wireless) channel, 
the attacker cannot remain undetected if it actively
interferes with the OOB channel.

For pairing methods based on OOB channels, some 
degree of human involvement is essential for achiev- 
ing security and thus making the user interaction error- 
free is extremely important. We observe that a large 
majority of existing device pairing methods (which we 
review in the following section) are proposed by secu- 
rity professionals without much expertise in usability, 
leaving the security and performance of the resulting 
methods unknown at best. Even for the few methods 
that have been tested for usability, the testing is done 
in isolation (without facilitating any fair comparison 
among the methods) or with a limited focus on only 
single-user scenarios. 

Motivation: The application domain for secure pairing
methods is not limited to single-user scenarios.
Two users often want to exchange files, digital business
cards, etc., with each other when they meet in
person. The main advantage of using Bluetooth or WiFi in
such scenarios is that no infrastructure is needed and
thus ad hoc communication can take place without any
extra cost to the users. For this reason, two-user scenarios
have been emerging rapidly and are already quite
popular, especially in developing countries. Secure
pairing of users' devices is a natural and recommended
way to prevent any eavesdropping and/or malicious
interception during their intended communication.

¹ We use the term "pairing" to refer to the bootstrapping of
secure communication between two devices communicating over a
wireless channel.

Many single-user pairing methods have been pro- 
posed, each having certain claimed advantages and 
shortcomings. A single-user pairing method could be 
directly used in a two-user scenario, only if one of the 
users operates both the devices. However, it might 
not always be desirable or feasible to designate both 
devices to one of the users, e.g., due to security and 
privacy reasons. In fact, as the survey results of our 
study show, a majority of users understand such secu- 
rity and privacy implications and would not be will- 
ing to share their devices with others, even temporarily 
during the pairing process. Therefore, it is not clear if 
existing single-user pairing methods are suitable (and 
to what extent) when applied in a two-user scenario. 
In fact, participation of two human users makes the secure
pairing process more complicated and potentially
error-prone.² Thus, not all methods might extend well
if each device is controlled by a different user.

In short, there is a pressing need to evaluate the ap- 
plicability and to compare the performance as well as 
the usability of existing pairing methods for a two-user 
setting. Such a study is essential to identify pairing 
method(s) most suitable for everyday users, in terms of 
efficiency, error-tolerance and of course, usability. 

Our Contributions: We overview prominent device
pairing methods, and identify which of these methods
or variants thereof are feasible to use in a two-user setting.
We implement the selected methods using a common
software platform and conduct a comprehensive
and comparative field study, focusing on both usability
and security in a two-user setting. Our work helps answer
the following question: what pairing method(s)
should be deployed in practice for two-user scenarios?
Without a thorough study, it would be hard to
answer this question simply based on intuition or prior
test results on single-user pairing scenarios. Our study
yields some interesting results which help us identify
the most appropriate method(s). Although this paper is
less technical in nature than traditional security and applied
cryptography research, we believe that the topic
is very important and timely since it sheds light on usability
in one of the emerging settings where most users
(not just specialists) are confronted with security techniques.
Also, since most device pairing methods are
developed by highly-skilled specialists who are clearly
not representative of the general population, there is a
certain gap between what seems to be, and what really
is, usable. We hope that our work will help narrow this
gap.

Scope: The scope of this paper is limited to two-user
scenarios. A comprehensive and comparative usability
evaluation of pairing methods for single-user
scenarios is of independent interest and has been
addressed in several recent works [12, 10, 8].

² On the other hand, unlike the single-user setting, the devices
taking part in the two-user setting are not usually constrained in
terms of input/output interfaces. This simplifies the two-user pairing
process to a certain extent.

Paper Organization: The rest of the paper is organized
as follows. Section 2 reviews notable cryptographic
protocols and relevant device pairing methods.
Section 3 discusses some pre-study design
choices and criteria, followed by usability testing details
in Section 4. In Section 5, we discuss our
interpretations of the results obtained in the course of
the study, and we conclude with a summary and future
work in Section 6.



2 Background 

In this section, we describe notable relevant crypto- 
graphic protocols and device pairing methods. The 
term cryptographic protocol denotes the entire inter- 
action involved, and information exchanged, in the 
course of device pairing. The term pairing method 
refers to the pairing process as viewed by the user, i.e., 
the sequence of user actions. As discussed later on, 
a single cryptographic protocol can be coupled with 
many pairing methods. 



2.1 Cryptographic Protocols 

One simple protocol was suggested in [1], where devices
A and B exchange their respective public keys
pk_A, pk_B over the insecure channel and the corresponding
hashes H(pk_A) and H(pk_B) over the OOB
channel. Although non-interactive, the protocol requires
H() to be a (weakly) collision-resistant hash
function and thus needs at least 80 bits of OOB data
in each direction. MANA protocols [6] reduce the size
of OOB messages to k bits while limiting the attacker's
success probability to 2^{-k}. However, these protocols
require a stronger assumption on the OOB channel: the
adversary is assumed to be incapable of delaying or replaying
any OOB messages.
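
As a rough illustration of the hash-based exchange described above, the following Python sketch sends a public key over the insecure channel and a truncated hash of it over the OOB channel. The use of SHA-256, the truncation to exactly 80 bits, and the names `oob_digest` and `accept_peer_key` are assumptions made for this example; they are not taken from the cited protocol.

```python
import hashlib
import hmac

OOB_BITS = 80  # minimum OOB size per direction mentioned in the text


def oob_digest(public_key: bytes, bits: int = OOB_BITS) -> bytes:
    """Truncated hash of a public key, to be sent over the OOB channel."""
    return hashlib.sha256(public_key).digest()[: bits // 8]


def accept_peer_key(received_key: bytes, received_oob: bytes) -> bool:
    """Accept a key from the wireless channel only if it matches the OOB digest."""
    return hmac.compare_digest(oob_digest(received_key), received_oob)


# Device A's view: pk_B arrives over the insecure channel, H(pk_B) over OOB.
pk_b = b"example public key of device B"    # placeholder bytes, not a real key
forged = b"key substituted by an attacker"  # what a MiTM might inject instead

print(accept_peer_key(pk_b, oob_digest(pk_b)))    # genuine key
print(accept_peer_key(forged, oob_digest(pk_b)))  # substituted key
```

The point of the sketch is only that the human-perceptible channel carries the digest, so a key swapped on the wireless channel no longer matches it.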

In [29], the author presented the first protocol based
on Short Authenticated Strings (SAS), which limits the
attack probability to 2^{-k} for a k-bit OOB channel, even
when the adversary can delay/replay OOB messages.
This protocol utilizes commitment schemes (which can
be based upon hash functions such as SHA-1 or MD5)
and requires 4 rounds of communication over the wireless
channel. Subsequent work ([14] and [21]) developed
3-round SAS protocols. Both are modifications
of [29] and both use (although in different ways) a universal
hash function in the computation of SAS messages.
Recently, the authors of [20] proposed a more efficient
SAS protocol requiring a k-bit SAS message
in one direction, but only a 1-bit SAS message in the
other. In this protocol, device A sends its k-bit SAS
message to device B; B checks whether the received
copy matches its own, and communicates the result to
A with a 1-bit SAS message. As discussed later, this
protocol is utilized in a number of pairing methods we
tested.
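
The general shape of a commitment-based SAS exchange with a k-bit message in one direction and a 1-bit reply in the other can be sketched as follows. This is a simplified illustration only: it is not the exact message flow of any of the cited protocols, the SAS is derived here with truncated SHA-256 rather than a universal hash function, and all function names and the value k = 15 are assumptions made for the example.

```python
import hashlib
import hmac
import secrets

K = 15  # SAS length in bits (assumed for this sketch)


def commit(value: bytes) -> tuple[bytes, bytes]:
    """Hash-based commitment; returns (commitment, opening nonce)."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + value).digest(), nonce


def opens_to(commitment: bytes, nonce: bytes, value: bytes) -> bool:
    """Check that (nonce, value) is a valid opening of the commitment."""
    expected = hashlib.sha256(nonce + value).digest()
    return hmac.compare_digest(commitment, expected)


def sas(pk_a: bytes, pk_b: bytes, r_a: bytes, r_b: bytes, k: int = K) -> int:
    """Derive a k-bit SAS from everything exchanged over the wireless channel."""
    h = hashlib.sha256()
    for part in (pk_a, pk_b, r_a, r_b):
        h.update(len(part).to_bytes(4, "big") + part)  # length-prefix each field
    return int.from_bytes(h.digest(), "big") >> (256 - k)


def pair(pk_a: bytes, pk_b: bytes) -> bool:
    """Honest run; returns the 1-bit result that B reports back to A."""
    r_a = secrets.token_bytes(16)
    c, nonce = commit(pk_a + r_a)            # A -> B (wireless): pk_a, c
    r_b = secrets.token_bytes(16)            # B -> A (wireless): pk_b, r_b
    if not opens_to(c, nonce, pk_a + r_a):   # A -> B (wireless): nonce, r_a
        return False
    sas_a = sas(pk_a, pk_b, r_a, r_b)        # A -> B (OOB): k-bit SAS
    sas_b = sas(pk_a, pk_b, r_a, r_b)        # B's own view; equal when honest
    return sas_a == sas_b                    # B -> A (OOB): 1-bit result
```

The commitment forces A to fix its random value before seeing B's, which is what keeps a short SAS meaningful; the user only has to carry the k-bit value one way and a yes/no answer back.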

2.2 Device Pairing Methods 

In recent years, a number of pairing methods have
been proposed. They operate over different OOB chan- 
nels, use different cryptographic protocols and offer 
varying degrees of usability. Recall that these methods 
have so far been proposed in the context of a single- 
user setting, and not the two-user setting, which is the 
focus of this paper. Later, in Section 3, we will discuss
the applicability of these methods to the two-user
setting.

The initial attempt to address the device pairing
problem in the presence of MiTM attacks was "Resurrecting
Duckling" [26]. It requires standardized physical
interfaces and cables. Although it was appropriate
in the 1990s, this is clearly obsolete today, due to the
greatly increased diversity (and decreased size) of devices
and the requirement of physical equipment (i.e.,
a cable), which defeats the purpose and convenience of
wireless connections.

"Talking to Strangers" [1] was another early method,
which relies on infrared (IR) communication as the
OOB channel and requires almost no user involvement,
except for initial setup. Moreover, it has been tested
with users (unlike many other methods), as reported
in [2]. However, this method is deceptively simple:
since IR is line-of-sight, setting it up requires
the user to find IR ports on both devices - not a trivial
task for many users - and align them. Also, despite its
line-of-sight property, IR is not completely immune to
MiTM attacks. Another drawback is that IR has been
largely displaced by other wireless technologies (e.g.,
Bluetooth) and is available on few modern devices.

Another early approach involves image comparison.
It encodes the OOB data into images and asks the user
to compare them on two devices. Prominent examples
include "Snowflake" [7], "Random Arts Visual Hash"
[22] and "Colorful Flag" [4]. Such methods, however,
require both devices to have displays with sufficiently
high resolution. Applicability is therefore limited to
high-end devices, such as laptops, PDAs and certain
cell phones. These methods are based on the protocol
proposed in [1], which was reviewed earlier. A more
practical approach, based on SAS protocols [21, 14] and
suitable for simpler displays and LEDs, has been investigated
in [28].

More recent work [19] proposed the "Seeing-is-Believing"
(SiB) pairing method. In its simplest instantiation,
SiB requires a uni-directional visual OOB
channel for one-way authentication: one device encodes
OOB data into a two-dimensional barcode which
it displays on its screen and the other device "reads it"
using a photo camera, operated by the user. At a minimum,
SiB requires one device to have a camera and
the other a display for uni-directional authentication,
and both devices to have a camera and a display for
bi-directional authentication. Thus, it is not suitable for
small or low-end devices.³ From the user's perspective,
SiB is a relatively undemanding pairing method
as user actions amount to taking a photo of a barcode.

³ Albeit, the display requirement can be relaxed in case of a
printer; alternatively, a camera-equipped device can snap a photo
of a barcode sticker affixed to the dumber device.

A related approach, called "Blinking Lights", has
been explored in [20]. Like SiB, it uses the visual OOB
channel and requires one device to have a continuous 
visual receiver, e.g., a light detector or a video cam- 
era. The other device must have at least one LED. The 
LED-equipped device transmits OOB data via blinking 
while the other receives it by recording the transmis- 
sion and extracting information based on inter-blink 
gaps. As in SiB, the receiver device indicates suc- 
cess/failure to the user who, in turn, informs the other 
to accept or abort. 

Quite recently, [23] developed a pairing method
based on synchronized audio-visual patterns. Three 
proposed methods, "Blink-Blink", "Beep-Beep" and 
"Beep-Blink", involve users comparing very simple 
audiovisual patterns, e.g., in the form of "beeping" 
and "blinking", transmitted as simultaneous streams, 
forming two synchronized channels. One advantage of 
these methods is that they require devices to only have 
two LEDs (one of which is to ensure synchronization) 
or a basic speaker. 

Another recent method is "Loud-and-Clear" (L&C)
[16]. It uses the audio (acoustic) OOB channel along
with vocalized MadLib sentences or phrases which
represent the digest of information exchanged over
the main wireless channel. There are two L&C variants:
"Phrase-DS" and "Phrase-SS". In the latter, the
user compares two vocalized sentences; in the former,
a displayed sentence with its vocalized counterpart.
Minimal device requirements include a speaker
(or audio-out port) on one device and a speaker or a
display on the other. The user is required to compare
the two respective (vocalized and/or displayed)
MadLib sentences and either accept or abort the pairing
based on the outcome of the comparison. As described
in [16], L&C is based on the protocol of [1]. In this
paper, to reduce the number of words in the MadLib
sentences, we use the L&C variant based on SAS protocols
[21, 14]. A third variant of L&C, "Phrase-DD,"
simply involves displaying the sentences on two
devices, which the user is asked to compare.

Some follow-on work (HAPADEP [25]) considered
pairing devices that - at least at pairing time - have
no common wireless channel. HAPADEP uses pure
audio to transmit cryptographic protocol messages and
requires the user to merely monitor device interaction
for any extraneous sounds or interference. It requires
both devices to have speakers and microphones. To
appeal to a more common setting (one where a common
wireless channel is available), we employ a HAPADEP
variant that we call "Over-Audio." This variant uses the
wireless channel for cryptographic protocol messages
and audio as the OOB channel. In it, only one
device needs a speaker and the other a microphone.
Also, the user is not involved in any comparisons.

An experimental investigation [27] presented the results
of a comparative usability study of simple pairing
methods for devices with displays capable of showing
a few (4-8) decimal digits of OOB data. In
the "Compare-Confirm" or "Numeric-Compare" approach,
the user simply compares two 4-, 6- or 8-digit
numbers displayed by the devices. In the "Select-Confirm"
approach, one device displays to the user a
set of (4-, 6- or 8-digit) numbers, and the user selects the
one that matches a single such number displayed by
the other device. In the "Copy-Confirm" approach, the
user copies a number from one device to the other. The
last variant is "Choose-Enter", which asks the user to
pick a "random" 4-to-8-digit number and enter it into
both devices. All of these methods are undoubtedly
simple; however, as [27] indicates, Select-Confirm and
Copy-Confirm are slow and error-prone. Furthermore,
"Choose-Enter" is insecure since studies show that the
quality of numbers (in terms of randomness) picked by
the average user is very low.

Yet another approach: "Button-Enabled Device Au- 
thentication" [24] suggests pairing devices with the 
help of user button presses, thus utilizing the tactile
OOB channel. This method has several variants:
"BEDA-Blink", "BEDA-Beep", "BEDA-Vibrate" and
"BEDA-Buttons". In the first three variants, respec- 
tively, based on the SAS protocol variant [20], the 
sending device blinks its LED (or beeps or vibrates) 
and the user synchronously presses a button on the re- 
ceiving device. Each 3-bit block of the SAS string is 
encoded as the delay between consecutive blinks (or 
beeps or vibrations). As the sending device blinks (or 
beeps or vibrates), the user presses the button on the 
other device thereby transmitting the SAS from one 
device to another. In the BEDA-Buttons variant, which 
can work with any PAKE protocol (e.g., [0) the user si- 
multaneously presses buttons on both devices and ran- 
dom user-controlled inter-button pressing delays are 
used as a means of establishing a common secret. 
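
The delay-based encoding used by the first three BEDA variants can be sketched as below: each 3-bit block of the SAS becomes the gap between two consecutive signals, and the receiver recovers the blocks from the timestamps of the user's button presses. The concrete timing constants (`BASE_DELAY`, `STEP`) and the function names are hypothetical; the source does not specify them.

```python
BITS_PER_BLOCK = 3
BASE_DELAY = 1.0  # seconds; hypothetical shortest gap between two signals
STEP = 0.5        # seconds added per unit of the 3-bit value (hypothetical)


def sas_to_delays(sas: int, sas_bits: int = 15) -> list[float]:
    """Split the SAS into 3-bit blocks (most significant first), one delay each."""
    blocks = sas_bits // BITS_PER_BLOCK
    delays = []
    for i in range(blocks):
        shift = (blocks - 1 - i) * BITS_PER_BLOCK
        value = (sas >> shift) & 0b111
        delays.append(BASE_DELAY + value * STEP)
    return delays


def presses_to_sas(press_times: list[float]) -> int:
    """Recover the SAS by rounding each gap between presses to the nearest step."""
    sas = 0
    for earlier, later in zip(press_times, press_times[1:]):
        value = round((later - earlier - BASE_DELAY) / STEP)
        value = min(max(value, 0), 7)  # clamp to a valid 3-bit value
        sas = (sas << BITS_PER_BLOCK) | value
    return sas


# In this sketch, six signals delimit the five gaps of a 15-bit SAS.
example_sas = 0b101_000_111_010_001
signal_times = [0.0]
for delay in sas_to_delays(example_sas):
    signal_times.append(signal_times[-1] + delay)
print(presses_to_sas(signal_times) == example_sas)
```

Rounding to the nearest step gives the user some tolerance: with these constants, a press may be off by a little under a quarter of a second per gap before a block is misread.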



A very different OOB channel was considered in
"Smart-Its-Friends" [13]: a common movement pattern
is used to communicate a shared secret to both
devices as they are shaken together by the user. A similar
approach is taken in "Shake Well Before Use" [17].
Both techniques require devices to be equipped with
2-axis accelerometers. Although some recent mobile
phones (e.g., iPhone) are equipped with them, accelerometers
are not common on other devices.

There are also other methods involving technologies
that are relatively more expensive and uncommon.
To name a few, [9] suggested using ultrasound
as the OOB channel. A related technique uses laser
as the OOB channel and requires each device to have a
laser transceiver [18]. However, the hardware needed for
these methods is not readily available in many current
devices and is not expected to be deployed soon.

3 Study Preliminaries 

This section discusses selection criteria for tested 
methods and devices as well as the architecture of our 
software platform. 

3.1 Methods to be Tested 

As follows from our overview in the previous section,
there is a large body of prior research on secure
device pairing, comprising a wide range of methods.
As mentioned earlier, all of these methods were
proposed in the context of a single-user pairing setting.
Some of these methods have been evaluated
in terms of their usability. These include Talking-to-Strangers
(Network-in-a-Box) [2], Compare-Confirm,
Copy-Confirm and Choose-Enter [27], as well as all
four variants of BEDA [24] and the Blink-Blink, Beep-Beep
and Beep-Blink combinations [23].

There are more than twenty methods, counting vari- 
ations, in literature. However, some of these meth- 
ods have very limited use cases due to either requir- 
ing both devices to be controlled by the same user dur- 
ing the pairing (e.g., accelerometer based methods such 
as El) or requiring hardware that is not ubiquitous 
among wireless enabled devices. Some methods also 
have stronger assumptions about the OOB channel and 
require it to be confidential (e.g., BEDA-Buttons vari- 
ant of [24]). Notice that secret OOB channels are hard 
to achieve in real life since a close-by attacker can easily
eavesdrop on any human-perceptible channel (e.g.,
by shoulder surfing the pairing process). 

We believe that it is very difficult to test all available 
methods in one single study and hope that the results 
will yield meaningful comparative usability metrics. 
Obstacles like varying security assumptions about the 
OOB channel among different methods and possible 
user fatigue from including too many methods would 
undermine the results of such study. Consequently, we 
have to cull the number of methods down to a more 
manageable number, eliminating those that are obso- 
lete, deprecated based on prior evaluations or unreal- 
istic on their OOB assumptions. Of course, we also 
eliminated any methods that are limited to a single-user
setting only. The following methods are excluded from 
our study: 

• Resurrecting-Duckling [26]: obsolete due to the physical
equipment (i.e., cable) requirement.

• Talking-to-Strangers [1]: obsolete since IR ports
are not secure against MiTM attacks and they
have become uncommon.

• Choose-and-Enter [27], BEDA-Buttons [24]: require
a secret OOB channel and performed
poorly in prior evaluations.

• Beep-Beep [23]: performed poorly in prior evaluations
due to user annoyance and high error rate.

• Blink-Blink [23], Image Comparison [7, 22, 4]: do
not extend well to a two-user setting, as the two
devices need to be placed adjacent to each other or
temporarily exchanged between users; also, varying
resolutions on devices make image comparison
a burdensome and error-prone task.

• Seeing-is-Believing [19], Blinking Lights [20]: require
photo or video cameras on devices and do
not extend well to the two-user setting due to
the need for close proximity between the devices;
also, cameras are not ubiquitous interfaces except
on mobile phones.

• BEDA-Vibrate [24]: vibration is not a common interface,
except on mobile phones; also, it is hard
for one user to sense the vibration on another
user's device, making this method unusable in a
two-user setting.

• Smart-its-Friends [13], Shake-Well-Before-Use [17]:
require one user to hold and control
both devices and thus do not extend to two-user
settings.

• Ultrasound and laser [18] based methods: require
interface and hardware capabilities that are
not common across devices.

Remaining methods have been included in our two- 
user study. However, we had to slightly modify or up- 
date certain methods to standardize the OOB assump- 
tions and security among the tested methods. In par- 
ticular, we have updated all methods to be based on 
a SAS protocol for better efficiency and unified se- 
curity assumptions. This update resulted in a slightly 
changed user interaction in BEDA-Beep, BEDA-Blink 
and Over-Audio methods from their original proposals.
A brief description of user interactions involved when 
Alice is pairing her device A with Bob's device B in 
the methods we tested is as follows. 

• BEDA-Beep: As and when A beeps, Bob presses 
a button on B synchronously; B indicates the re- 
sult of pairing on its screen; Bob indicates the 
same result to Alice; Alice accepts or rejects pair- 
ing on A accordingly. The time intervals between 
successive beeps encode the 15-bit SAS value.

• BEDA-Blink: As and when A blinks/flashes its 
screen, Bob presses a button on B synchronously; 
B indicates the result of pairing on its screen; Bob 
indicates the same result to Alice; Alice accepts or 
rejects pairing on A accordingly. The time intervals
between successive blinks encode the 15-bit
SAS value.

• Beep-Blink: While A is blinking/flashing its 
screen and B is beeping, Alice compares the 
blinking/flashing of A with the beeping of B and 
Bob compares the beeping of B with the blink- 
ing/flashing of A; based on the comparison, both 
Alice and Bob accept or reject the pairing on A 
and B, respectively. The on-off blinking/beeping 
encodes a 15-bit string. 

• Over-Audio: A plays an audio that has 15-bit 
string encoded into it and B records this audio; 
B indicates the result of pairing on its screen; Bob 
indicates the same result to Alice; Alice accepts 
or rejects pairing on A accordingly. 






• Numeric-Compare: A and B display a 5-digit
number (each) on their respective screens; Alice 
compares the number displayed on A with the 
number displayed on B (with Bob's help); Bob 
compares the number displayed on B with the 
number displayed on A (with Alice's help); based 
on the comparison, both Alice and Bob accept or 
reject the pairing on A and B, respectively. 

• Phrase-DD: A and B display 3-word phrases
on their respective screens; Alice compares the 
phrase displayed on A with the phrase displayed 
on B (with the help of Bob); Bob compares the 
phrase displayed on B with the phrase displayed 
on A (with the help of Alice); based on the com- 
parison, both Alice and Bob accept or reject the 
pairing on A and B, respectively. 

• Phrase-DS: A displays a 3-word phrase on its 
screen and B "speaks-out" a 3-word phrase; Al- 
ice compares the phrase displayed on A with the 
phrase spoken by B; Bob compares the phrase 
spoken by B with the phrase displayed on A (with 
Alice's help); based on the comparison, both Al- 
ice and Bob accept or reject the pairing on A and 
B, respectively. 

• Phrase-SS: A speaks out a 3-word phrase and
then B speaks out a 3-word phrase; Alice compares
the phrase spoken by A with the phrase spoken
by B; Bob compares the phrase spoken by B
with the phrase spoken by A; based on the comparison,
both Alice and Bob accept or reject the pairing
on A and B, respectively.

• Copy-Confirm: A displays a 5-digit number on 
its screen; Alice indicates the number to Bob and 
he inputs it on B; B indicates the result of pair- 
ing on its screen; Bob indicates the same result 
to Alice; Alice accepts or rejects pairing on A ac- 
cordingly. 
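
To make the comparison-based methods above concrete, the following sketch shows one way a 15-bit SAS could be rendered as a 5-digit number or as a 3-word phrase. The mapping is an assumption made for illustration: the source does not state how the digits or phrases are derived, and the 32-word list below is hypothetical (the study used MadLib-style sentences whose vocabulary is not given here).

```python
SAS_BITS = 15

# Hypothetical 32-word list: each word carries 5 bits, so 3 words carry 15 bits.
WORDS = [
    "apple", "bridge", "candle", "desert", "eagle", "forest", "garden", "harbor",
    "island", "jacket", "kettle", "ladder", "marble", "needle", "orange", "pencil",
    "quartz", "rocket", "saddle", "tunnel", "umbrella", "valley", "window", "yellow",
    "zipper", "anchor", "basket", "castle", "dragon", "engine", "falcon", "guitar",
]


def sas_to_number(sas: int) -> str:
    """Zero-padded decimal rendering; 2**15 - 1 = 32767 fits in 5 digits."""
    return f"{sas:05d}"


def sas_to_phrase(sas: int) -> str:
    """Three words, each selected by 5 bits of the SAS (most significant first)."""
    indices = [(sas >> shift) & 0b11111 for shift in (10, 5, 0)]
    return " ".join(WORDS[i] for i in indices)


def users_accept(shown_on_a: str, shown_on_b: str) -> bool:
    """Models the users' decision: accept only if both renderings match."""
    return shown_on_a == shown_on_b


sas_a, sas_b = 12345, 12345  # equal values, as when no attack takes place
print(sas_to_number(sas_a), "|", sas_to_phrase(sas_a))
print(users_accept(sas_to_number(sas_a), sas_to_number(sas_b)))
```

Under this mapping, Numeric-Compare, Phrase-DD, Phrase-DS and Phrase-SS differ only in how the same 15-bit value is presented (displayed or spoken) on each device.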

3.2 Test Devices to be Used 

In a single-user setting, one of the devices might be
constrained in terms of input and output interfaces.
For example, a headset being paired with a cell
phone, or an access point being paired with a
laptop, is a constrained device (with no display or keypad).
On the other hand, both devices participating in
a two-user communication setting are "personal" devices
(such as PDAs, cell phones, laptops) and therefore
would not be constrained. These devices are generally
equipped with at least a full display and a keypad.

We wanted to simulate, as closely as possible, common
two-user pairing scenarios. To this end, for our
entire study, we used two Nokia cell phone models,⁴
N73 and E61, as test devices. Both models were
released two years ago (in 2006) and hence do not represent
cutting-edge technology. This was done on
purpose in order to avoid devices with expensive features
as well as processors faster than those commonly
available at present.

Another reason for choosing these devices is the
plethora of common interfaces available on them. Recall
that our goal is to test many methods utilizing
many different OOB channels, including audio,
visual and tactile. For each of these channels, some
methods need user input, user output or both. The audio
channel can require a speaker, beeper or microphone.
The visual channel requires an LED or a screen,
whereas the tactile channel can require a button or keypad.
Our test devices have all these features, which
allows testing all methods consistently. (Otherwise,
changing devices across methods would seriously undermine
the credibility of results.) Specifically, both
N73 and E61 have the following features relevant to
our tests:

• Input Interface: keypad (subsumes button), mi- 
crophone 

• Output Interface: speaker (subsumes beeper), 
color screen (subsumes LED) 

• Wireless: Bluetooth 

In all tests, Bluetooth was used as the wireless (human- 
imperceptible) channel. We consider this choice to be 
natural since Bluetooth is widely available and inex- 
pensive. 

For methods that involve beeping, the general-purpose
speaker is trivial to use as a beeper. Whenever
a button is needed, one of the keypad keys is easily
configured for that purpose. A picture of a bright
LED was shown on the screen in place of a blinking LED.
Utilizing the whole screen rather than an LED was a
natural and obvious choice for two-user pairing.

⁴ For specs, see:
www.nokiausa.com/A4409012
europe.nokia.com/A4142101

3.3 Implementation Details 

In comparative usability studies, meaningful and fair 
results can only be achieved if all methods are tested 
under similar conditions and settings. In our case, 
the fair comparison basis is formed by: (1) keeping 
the same test devices, (2) employing consistent GUI 
design practices (e.g., safe defaults), and (3) unify- 
ing targeted (theoretical) security level for all meth- 
ods. Our goal is to isolate - to the extent possible - 
user interaction in different methods as the only in- 
dependent variable throughout all tests. Minimizing 
any experimenter-introduced, systematic and cognitive 
bias is also important. In particular, we randomize test 
order, avoid close physical proximity and interaction 
between the participants and the experimenter, and au- 
tomate timing and logging to minimize errors and bi- 
ases. 
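
The bias controls mentioned above (randomized test order, automated timing) could be realized along the following lines. This is a sketch under stated assumptions, not the framework actually used in the study: the per-session seeding scheme and the helper names `method_order` and `timed` are hypothetical.

```python
import random
import time

# The nine methods retained for the two-user study (Section 3.1).
METHODS = [
    "BEDA-Beep", "BEDA-Blink", "Beep-Blink", "Over-Audio", "Numeric-Compare",
    "Phrase-DD", "Phrase-DS", "Phrase-SS", "Copy-Confirm",
]


def method_order(session_id: int) -> list[str]:
    """Random permutation of the methods, reproducible from the session id."""
    order = METHODS.copy()
    random.Random(session_id).shuffle(order)
    return order


def timed(run_method):
    """Run one pairing attempt and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = run_method()
    return result, time.perf_counter() - start


# Example: the order for one hypothetical session, and a timed dummy attempt.
print(method_order(session_id=1))
decision, seconds = timed(lambda: "accept")
print(decision, f"{seconds:.6f}s")
```

Seeding from the session id keeps the order random across participant pairs while letting the experimenter reconstruct it later from the logs.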

Some of the tested methods already had prior work- 
ing prototype implementations. However, these were 
mostly developed by their authors who aimed to 
demonstrate implementation feasibility. Consequently, 
such prototypes are often: incomplete, buggy and/or 
fragile as well as very dependent on specific hard- 
ware/software platforms. It is nearly impossible to pro- 
vide a uniform testing environment using available pro- 
totypes. Modifying them or implementing each from 
scratch is also not an option, due to the level of effort 
required. For stand-alone applications, implementing 
only the user interface is usually enough for the pur- 
poses of usability testing. However, distributed appli- 
cations, such as secure device pairing, need more than 
just user interface, since a realistic user experience is 
unattainable without any connection between devices. 

To achieve a unified software platform, our
implementation used the open-source comparative usability
testing framework developed by Kostiainen et al. [11].
It provides basic communication primitives between
devices as well as automated logging and timing
functionality. However, we still had to implement separate
user interfaces and simulated functionality for all tested
methods. We used Java MIDP to implement all methods
and created several test cases for "no-attack" and
"under-attack" scenarios. (The term attack is limited
to MiTM/Evil-Twin attacks in this context.)



For all methods, we kept the SAS string length constant
at 15 bits. We also tried to keep all user interfaces
similar, while applying the same design practices, e.g.,
safe-default selection prompts, clear instructions and
simple language. All methods are precisely timed
from the start to the end of user interaction.

We believe that, in our implementation, user expe- 
rience and interaction model are very realistic. For 
most methods tested, the only difference between our 
variant and a real method is that we omitted the initial 
rounds of the underlying cryptographic protocol (e.g., 
SAS) that use the wireless channel, i.e., do not involve 
the user. Instead, our implementation supplies devices 
with synthetic SAS strings to easily simulate normal 
and MiTM attack scenarios. Since messages
over the wireless channel are completely transparent
to the user, our simulation, from the user's perspective,
very closely resembles the real-life version.
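
As a rough illustration of the simulation described above, the sketch below generates synthetic 15-bit SAS values for the two devices: identical values for a no-attack test case and differing values for an under-attack case. It is written in Python for brevity (the study's implementation used Java MIDP), and the function names and the 5-digit decimal rendering are assumptions made for illustration, not details of the tested implementation.

```python
import secrets
from typing import Tuple

SAS_BITS = 15  # SAS length kept constant for all methods in the study


def synthetic_sas_pair(under_attack: bool) -> Tuple[int, int]:
    """Return (sas_a, sas_b) for devices A and B.

    No-attack case: both devices receive the same value.
    Under-attack (MiTM) case: the two values are guaranteed to differ.
    """
    sas_a = secrets.randbits(SAS_BITS)
    if not under_attack:
        return sas_a, sas_a
    # XOR with a non-zero 15-bit mask so that sas_b always differs from sas_a.
    mask = 1 + secrets.randbelow((1 << SAS_BITS) - 1)
    return sas_a, sas_a ^ mask


def as_digits(sas: int) -> str:
    """Hypothetical display format: zero-padded decimal.

    The largest 15-bit value is 2**15 - 1 = 32767, which fits in 5 digits.
    """
    return f"{sas:05d}"
```

In SAS-style protocols, a k-bit string is generally taken to bound an attacker's success probability by roughly 2^-k, which for 15 bits is 1 in 32768.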



4 Usability Testing Details 

Having implemented all selected pairing methods on a
common platform, we are ready to start the usability
study for the two-user setting. Our goal is to evaluate
and compare the pairing methods we identified (and listed
in Section 3.1) with respect to the following factors:



1. Efficiency: time it takes to complete each method 

2. Robustness: how often each method leads to
false positives (rejection of a successful pairing
instance) and false negatives (acceptance
of an unsuccessful pairing instance, i.e., a pairing
instance that is not between the intended devices).
Following the terminology introduced in E71, we
will refer to errors in the former category as
safe errors and to those in the latter as fatal errors.

3. Usability: how each method fares in terms of user 
burden (i.e., ease-of-use perception) and personal 
preference. 

4. User Interactions: how the two users interact in
order to perform the various steps involved in
each method, and in particular, whether they would
hand their devices to each other.
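
To make the robustness factor concrete, the following sketch computes safe and fatal error rates from a list of logged trials. The `Trial` record and the choice of denominators (safe errors over no-attack trials, fatal errors over under-attack trials) are assumptions made for illustration; the logging format of the actual testing framework is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Trial:
    under_attack: bool  # True if the SAS values differed (simulated MiTM)
    accepted: bool      # True if the users accepted the pairing


def error_rates(trials: Iterable[Trial]) -> Tuple[float, float]:
    """Return (safe_error_rate, fatal_error_rate).

    Safe error: a no-attack pairing that the users rejected.
    Fatal error: an under-attack pairing that the users accepted.
    Each rate is taken over the trials of the matching kind.
    """
    trials = list(trials)
    normal = [t for t in trials if not t.under_attack]
    attacked = [t for t in trials if t.under_attack]
    safe = sum(not t.accepted for t in normal) / len(normal) if normal else 0.0
    fatal = sum(t.accepted for t in attacked) / len(attacked) if attacked else 0.0
    return safe, fatal
```

For example, three accepted and one rejected no-attack trial, together with one rejected and one accepted under-attack trial, give a safe error rate of 0.25 and a fatal error rate of 0.5.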
4.1 Study Participants

For our tests, we recruited 40 participants (see
footnote 5); two participants had to be present at
the same time to complete the tests. The study lasted
more than two months. Each pair of participants
was chosen carefully, as we required
them to have varying trust relationships with each
other, ranging from strangers to acquaintances
to close friends. Each pair of users was briefed on
the estimated amount of time required to complete the
tests. The participants were mostly university
students at both the undergraduate and graduate
levels. Thus our study represents only the first step
towards identifying methods suitable for a broad
cross-section of the user population.



Age                            18-24: 95%, 25-29: 5%
Gender                         Female: 35%, Male: 65%
Avg. Computer Experience       8.5 years (2.14 StDev)
Avg. Computer Usage Per Day    7.75 hours (3.82 StDev)

Figure 1: User Demographics



We prepared two questionnaires: background - to
obtain user demographics and post-test - for user
feedback on methods tested. The user demographics are
shown in Figure 1.

None of the study participants reported any physical 
impairments that could interfere with their ability to 
complete given tasks. The gender split was: 65% male 
and 35% female as shown in the figure. Gender and 
other information was collected through background 
questionnaires completed prior to testing. Also
collected prior to testing was information on whether
the participants knew each other and, if so, how well
they knew each other.

5 This was one of the challenges in pursuing a two-user
study - twice as many participants were needed compared
to single-user testing. It is well-known that a usability
study performed by 20 sets of participants captures over
98% of usability related problems.

Trust relations between participants: Among the 20
subject pairs, 5 had not met before (and thus
were complete strangers), 5 were close friends,
and the remaining 10 were friends or colleagues who
did not consider each other very close friends.
In order to gain some insight into the trust relations and 
the acceptable interaction between the subject pairs, we 
asked them whether they would consider temporarily 
handing their device to the other person in order to ini- 
tiate a secure connection that they can later use to ex- 
change files, messages or play games. We also asked 
their reasoning and concerns related to their answers. 

Not surprisingly, all 5 pairs that had not met before
said they would not consider any physical exchange
of the devices an acceptable interaction. The two
main concerns we identified in those pairs were the
security of the device and the data it stores, as well as
the unpleasant social situation it may create. On the
other hand, 4 out of the 5 pairs of close friends
did not state any privacy concerns
and said they would physically exchange their
devices during pairing if needed. Among the 10
pairs that are friends or colleagues, 6 expressed
serious concerns about any physical device exchange
and considered it unacceptable; their reasoning
was similar to that of the first group.

From the observed trust relations and concerns ex- 
pressed by our subjects, we can conclude that any 
method that needs physical exchange of devices is un- 
acceptable in many scenarios where the owners of the 
devices do not know each other very well. Moreover, 
it may still be problematic in some situations even if 
the device owners know each other. Among the 8 pairs
who were not reluctant to exchange devices, the
relationship between the users played a strong role.
Surprisingly, 5 of the 8 pairs considered only friends
an acceptable social group for temporarily exchanging
devices, and even excluded family members. The
remaining 3 considered both family and friends
acceptable. However, we believe that the observed strong
tendency to share devices with friends rather than with
family members was perhaps due to the biased sampling
of our subjects, i.e., mostly unmarried college
students.
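
The counts reported above can be cross-checked with a short tally. The sketch below is only an arithmetic aid: the dictionary layout and function name are hypothetical, and it assumes that the 4 friend-or-colleague pairs who did not voice serious concerns were willing to exchange devices, which matches the total of 8 willing pairs mentioned in the text.

```python
# (number of pairs, pairs willing to hand over a device), per trust group
TRUST_GROUPS = {
    "strangers": (5, 0),
    "friends or colleagues": (10, 4),  # 6 of 10 considered it unacceptable
    "close friends": (5, 4),
}


def willingness_summary(groups):
    """Return (willing_pairs, total_pairs, willing_fraction)."""
    total = sum(pairs for pairs, _ in groups.values())
    willing = sum(ok for _, ok in groups.values())
    return willing, total, willing / total
```

Under these assumptions, `willingness_summary(TRUST_GROUPS)` yields 8 willing pairs out of 20, i.e., 40%, so a majority of pairs (12 of 20) would not hand over their devices.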

4.2 Test Cases 

There were a total of 9 tests, which each of our 20
subject pairs performed. The test cases were designed
so that attacks show up probabilistically: the users
do not see both the no-attack and the under-attack
scenario for each method, but see one of the two,
each with probability one half. This
way we prevent the users from expecting one no-attack
and one under-attack test case for each method. This
also reduces the number of tests to be performed by
the users.
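
A minimal sketch of such a test plan is given below: each of the 9 methods appears once, is marked as under-attack with probability one half, and the order is shuffled (test order was randomized, as noted in Section 3.3). The function name and the use of Python's `random` module are illustrative assumptions, not the framework's actual mechanism.

```python
import random
from typing import List, Optional, Tuple

METHODS = [
    "BEDA-Beep", "BEDA-Blink", "Beep-Blink", "Over-Audio",
    "Numeric-Compare", "Phrase-DD", "Phrase-DS", "Phrase-SS",
    "Copy-Confirm",
]


def build_test_plan(rng: Optional[random.Random] = None) -> List[Tuple[str, bool]]:
    """Return a shuffled list of (method, under_attack) pairs for one subject pair."""
    rng = rng or random.Random()
    # One test per method; the attack flag is an independent fair coin flip.
    plan = [(method, rng.random() < 0.5) for method in METHODS]
    rng.shuffle(plan)
    return plan
```

Passing a seeded `random.Random` instance makes a plan reproducible, which can help when a session has to be repeated or audited.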

4.3 Testing Process 

Our study was conducted in a variety of campus venues 
including, but not limited to: student laboratories, 
cafes, student dorms/apartments, classrooms, office 
spaces and outdoor terraces. This was possible since 
the test devices were mobile, test set-up was more-or- 
less automated and only a minimal involvement from 
the test administrator was required. 

After giving a brief overview of our study goals, we 
asked the participants to fill out the background ques- 
tionnaire in order to collect demographic information. 
Both participants in each test jointly filled out the
questionnaire. In this questionnaire, we also asked the
participants whether they suffer(ed) from any visual or 
hearing impairments, or have any condition that may 
interfere with their operating of devices or their re- 
flexes. After a short interview about the relationship 
between each pair of subjects, they were given a brief 
introduction to the cell-phone devices used in the tests. 

Each pair of users was given two devices (one per 
user) and asked to follow the on-screen instructions 
shown before each task to complete it. The two users
performed the tests together. We watched closely
whether the two users exchanged their
respective devices to carry out the tests.

User interactions throughout the tests were observed 
by the test administrator and timings were logged auto- 
matically by the testing framework. After completing 
the tasks, each pair jointly filled out the post-test
questionnaire form, in which they provided their feedback
on the various methods tested and indicated whether they
found any particular test difficult. The pairs were also
given a few minutes of free discussion time, where they 
discussed with the test administrator their experience 
with the various methods tested. 

4.4 Test Results 

We collected data in two ways: (1) by timing, observing
and logging user interaction, and (2) via questionnaires
and free-form interviews.

Method            Time       Fatal Error  Safe Error  Avg # of trials
                  (seconds)  rate         rate        until success
BEDA-Beep         40.43      0.00         0.14        1.14
BEDA-Blink        96.00      0.00         0.10        2.20
Beep-Blink        45.11      0.09         0.11        1.20
Over-Audio        18.75      0.00         0.00        1.13
Numeric-Compare   12.50      0.00         0.10        N/A
Phrase-DD         11.44      0.00         0.00        N/A
Phrase-DS         21.45      0.00         0.00        N/A
Phrase-SS         38.71      0.00         0.00        N/A
Copy-Confirm      17.00      0.17         0.00        N/A

Figure 2: Summary of Logged Data

For each method, completion times, errors, actions
and the playcount, i.e., the number of trials it took
before successful pairing was established, were
automatically logged by the software. The logged data is
summarized in Figure 2. The average timing information
(with standard deviations) is graphed in Figure 3. Also
graphed, in Figure 4, is the playcount for the applicable
methods.

We also observed the user interactions while each 
pair of users was performing the various steps involved 
in each tested method. In general, we observed that the 
subjects often decided the outcome of pairing based on 
mutual agreement, which, we believe, may have helped 
to reduce errors in most of our comparison-based meth- 
ods. We did not observe any statistical correlation be- 
tween the closeness of the relationship between the 
participants and their interactions during the pairing 
process. Observed interactions for each method are 
summarized below. (Assume Alice is pairing her
device A with Bob's device B.)

Figure 3: Time-to-Completion for Successful Pairing

Figure 4: Interval Plot of Playcount (number of trials
vs. method, 95% CI for the mean)

• BEDA-Beep: The user responsible for press- 
ing the button (Bob) would listen carefully for 
the beeping on the other device (A) and syn- 
chronously press any button on B. The user of 
the beeping device (Alice) moved closer to Bob 
within a distance of about 1-2 feet so that the 
beeping sound could be heard clearly by him. Alice
also watched whether Bob was synchronizing
his button presses with the beeping. Once finished
with this phase, Bob verbally notified Alice to 
accept or reject the pairing, based on the result 
shown on B. 

• BEDA-Blink: The user of the blinking device 
(Alice) would show her device to the other user 
(Bob) and Bob would press the button in synchro- 
nization with the blinking. The users were again 
within a distance of 1-2 feet. Once finished with 
this phase, Bob verbally notified Alice to accept 
or reject the pairing, based on the result shown on 
B. 

• Beep-Blink: After starting the pairing pro- 
cess, both Alice and Bob compared the blink- 
ing/beeping on their own device with the beep- 
ing/blinking on the other device. This required 
the two users to be within touching distance of 
each other so that both could watch as well as lis- 
ten to the flashing screen and beeping on the two 
devices, respectively. At the end, both users will 
accept or reject the pairing on their respective de- 
vices, based on their mutual judgement of whether 
blinking/beeping was synchronized or not. 



• Over-Audio: In this method, the role of the two
users was quite "passive." Alice's device would 
start to play an audio encoding a bitstring and 
Bob's device would automatically record it. After 
the audio transfer was over, Bob would (verbally) 
tell Alice to accept or reject depending on what 
his screen indicated. 

• Numeric-Compare: In this method, both Alice 
and Bob either spell out or show the number dis- 
played on the screens of their devices, compare 
the two and mutually accept or reject the pairing 
on their respective devices. 

• Phrase-DD: Similar to Numeric-Compare, both 
Alice and Bob either spell out or show the sen- 
tence displayed on the screens of their devices, 
compare the two and accept or reject the pair- 
ing on their respective devices, based on mutual 
agreement. 

• Phrase-DS: This method requires one user (Alice)
to listen carefully to the sentence spoken by
the device of the other user (Bob) and then compare
it with the sentence displayed on the screen of her
device, and vice versa. For this to take place, Bob
would bring his device closer (about 1-2 feet) to
Alice's device, so that she could hear the spoken
sentence clearly. Alice and Bob
mutually accept or reject the pairing on their
respective devices, following a short discussion.

• Phrase-SS: In this method, both devices
speak a sentence aloud. When the devices
spoke, both users would lean towards
the devices to listen to them carefully. After
listening to both sentences, they would decide
whether the sentences were the same. They would then
accept or reject the pairing on their respective
devices, after verbally confirming with each other.

• Copy-Confirm: Alice would either read out the
number displayed on A or directly show the
screen displaying the number to Bob. After entering
the number on his device, Bob would verbally
notify Alice to accept or reject, depending
on what his screen directed him to do.

Finally, through the post-test questionnaire, we
solicited user opinions about all tested methods. Each
pair of users rated each method on a 6-level Likert
scale [15] over various statements, such as the method
is easy to use, professional, fun to use, tiring, takes too
long to complete, and error prone. The ease-of-use and
other attribute ratings are graphed in Figure 5.

5 Interpreting Results 

In this section we attempt to interpret the results of our 
study. We first consider various mechanical data, i.e., 
time to completion and error rates. We then analyze 
the perceived qualitative aspects of the methods based 
on collected user ratings. 

5.1 Interpreting Time and Error Results 

Our results reflected in Figure 2 and Figure 3 prompt
a number of observations.

Completion time: The logical way to interpret the
completion time is by looking at it under normal
circumstances, i.e., when no active or passive attacks
occur, as this is what users will normally experience.
Thus, we only considered the no-attack test cases while
calculating the average completion time. Based on
this performance metric, tested methods fall into three
speed categories: fast (less than 20 secs), moderate
(between 20 and 30 secs) and slow (more than 30 secs).
The fastest method is Phrase-DD at 11.44 secs for a
successful outcome and it is closely followed by
Numeric-Compare at 12.5 secs. Copy-Confirm and
Over-Audio are next, taking 17 and 18.75 secs,
respectively. Phrase-DS comes in at 21.45 secs and its
performance is considered moderate and acceptable. The
slow category includes the rest, ranging from Phrase-SS
(38.71 secs) to BEDA-Blink, which takes a whopping
96 secs.
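
The three speed categories can be reproduced directly from the completion times in Figure 2. The sketch below is illustrative only; the function names are hypothetical, and the thresholds are the 20- and 30-second boundaries stated above.

```python
# Average completion times in seconds for successful pairing (Figure 2).
COMPLETION_SECS = {
    "BEDA-Beep": 40.43, "BEDA-Blink": 96.00, "Beep-Blink": 45.11,
    "Over-Audio": 18.75, "Numeric-Compare": 12.50, "Phrase-DD": 11.44,
    "Phrase-DS": 21.45, "Phrase-SS": 38.71, "Copy-Confirm": 17.00,
}


def speed_category(seconds: float) -> str:
    """Map a completion time to the fast / moderate / slow buckets."""
    if seconds < 20:
        return "fast"
    if seconds <= 30:
        return "moderate"
    return "slow"


def methods_by_category(times):
    """Group method names by speed bucket, fastest first within each bucket."""
    groups = {"fast": [], "moderate": [], "slow": []}
    for method, secs in sorted(times.items(), key=lambda kv: kv[1]):
        groups[speed_category(secs)].append(method)
    return groups
```

Applied to `COMPLETION_SECS`, this places Phrase-DD, Numeric-Compare, Copy-Confirm and Over-Audio in the fast bucket, Phrase-DS alone in the moderate bucket, and the remaining four methods in the slow bucket.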

Error Rates: As explained in Section 4, fatal errors
have a grave effect on security as they would allow a
MiTM attack to succeed. On the other hand, safe
errors do not constitute an immediate security threat
but create user annoyance as the pairing process
has to be repeated, thus prolonging the completion
time. However, in addition to a poor usability
experience, safe errors may eventually threaten security,
as too much user annoyance may result in careless
behavior that can lead to fatal errors.



In our tests, most methods, except Copy-Confirm
and Beep-Blink, fare well, reporting no fatal errors.
Copy-Confirm and Beep-Blink suffered from fatal error
rates of 17% and 9% respectively, which constitutes
a significant vulnerability in the context of security
applications. BEDA-Beep, BEDA-Blink, Beep-Blink
and Numeric-Compare all yield error rates of 10% or
more, but for safe errors, which are considered
not to threaten security directly. However, as
discussed earlier, safe errors are an indication of poor
usability and may eventually have adverse effects on
security as well as efficiency due to user annoyance
(e.g., after many trials with unsuccessful outcome, an
annoyed user may just accept the connection without
checking or despite the non-matching SAS values).

Fortunately, it turns out that the fastest method is 
also one of the error-free methods. Taking both speed 
and error-rate into account, the overall best method 
is clearly Phrase-DD, followed by Over-Audio and 
Phrase-DS. Although Numeric-Compare is also quite
fast, there is little motivation for using it over
Phrase-DD. The reason is simple: both methods require the
same hardware on devices (basic displays), and
Phrase-DD provides lower error rates and takes about the
same time as Numeric-Compare (this also confirms our
intuition that users are better at interpreting phrases
than numbers). Thus, for two-user pairing scenarios, where
both devices are equipped with decent quality displays,
Phrase-DD is a clear winner.
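
One simple way to reproduce this ranking from the logged data in Figure 2 is to keep only the methods with no recorded errors and order them by completion time. The sketch below does exactly that; the selection rule is an illustrative simplification of the reasoning above, and the names are hypothetical.

```python
# method: (time in seconds, fatal error rate, safe error rate), from Figure 2
LOGGED = {
    "BEDA-Beep": (40.43, 0.00, 0.14),
    "BEDA-Blink": (96.00, 0.00, 0.10),
    "Beep-Blink": (45.11, 0.09, 0.11),
    "Over-Audio": (18.75, 0.00, 0.00),
    "Numeric-Compare": (12.50, 0.00, 0.10),
    "Phrase-DD": (11.44, 0.00, 0.00),
    "Phrase-DS": (21.45, 0.00, 0.00),
    "Phrase-SS": (38.71, 0.00, 0.00),
    "Copy-Confirm": (17.00, 0.17, 0.00),
}


def error_free_by_speed(data):
    """Return methods with zero fatal and safe errors, fastest first."""
    clean = [m for m, (_, fatal, safe) in data.items() if fatal == 0 and safe == 0]
    return sorted(clean, key=lambda m: data[m][0])
```

With the Figure 2 values, the result is Phrase-DD, Over-Audio, Phrase-DS and Phrase-SS, in that order, which agrees with the ordering discussed above.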

Using similar reasoning, Over-Audio and Phrase-DS
appear to be the best choices if an audio channel can be
utilized. However, Over-Audio needs a microphone on
one device (devices such as laptops might not always
be equipped with a microphone) and a speaker on the
other. Phrase-DS can also be used in scenarios where
one device has a speaker. Phrase-SS is also error-free;
however, it is relatively slow compared to Phrase-DS, and
thus there is no good motivation for Phrase-SS to be
used over Phrase-DD or Phrase-DS.

Although the methods BEDA-Beep, BEDA-Blink
and Beep-Blink have lower hardware requirements and
work on devices that just have the most basic interfaces
like a beeper, LED or a button, they take too long to
complete. They usually require more than one trial to
achieve a successful pairing (as we show in Figure 4)
and Beep-Blink also yields too high a fatal error rate.
Considering that the devices taking part in two-user
scenarios have decent quality interfaces, BEDA-Beep,
BEDA-Blink and Beep-Blink can thus be safely ruled
out, in favor of Phrase-DD or Phrase-DS.

Figure 5: User opinion

5.2 Interpreting User Ratings 

We now turn our attention to the graph in Figure 5,
which summarizes the user opinions collected via the
post-test questionnaire. Users were asked to rate six
different statements for each method on a 6-point
Likert scale ranging from "Strongly Disagree"
to "Strongly Agree." The rated statements for each
method were based on the criteria: {Easy, Professional,
Fun to Use, Tiring, Taking Too Long, Error
Prone}.

As can be seen from Figure 5, there is some observable,
and statistically significant, correlation among the
user ratings. Methods that are rated as easy are also
rated as fun to use and professional. Moreover, methods
that are rated as "not easy" were perceived as error
prone, tiring and taking too long to complete. The
cross-correlation of user ratings is given in Table 1.

Not surprisingly, Numeric-Compare is ranked
among the easiest methods, consistent with its fast
completion time and with it being the most familiar
method to users, as it has already been deployed in many
personal devices. As expected, Phrase-DD and Over-Audio
also received very high ratings and are ranked among the
easiest, most fun to use and most professional methods.
All three of Numeric-Compare, Phrase-DD and Over-Audio
are among the user favorites.

Despite their poor timing and/or high error rates,
Beep-Blink and Copy-Confirm are ranked surprisingly
positively by our users. Both methods were perceived
as easy and error-resistant. Considering that Copy-Confirm
had a very high fatal error rate (as discussed
previously) and was rated as one of the least error-prone
methods by the subjects, we can conclude that the
participants who committed a fatal error were clearly not
aware of it. Also, judging from the high error rates of
both Copy-Confirm and Beep-Blink, we can say that
users' perception of security may be far from reality.
This contradiction can also be observed for
Phrase-DS and Phrase-SS, although in the other
direction. These two methods are ranked among the
most error prone of all tested methods, yet they yielded
a 0% error rate in our tests.

BEDA-Beep, BEDA-Blink, Phrase-DS and Phrase-SS
are the methods considered relatively hard, error
prone, taking a long time to complete, less professional
and less fun to use. The relatively lower user ratings
for BEDA-Beep and BEDA-Blink agree with the long
completion timings and high error rates we observed in
our tests. However, user perception was misleading
for Phrase-DS and Phrase-SS, especially in terms of
how error prone they are.

Table 1: Cross-Correlation of Polled Measures.
Correlation coefficients: -0.445, 0.817, -0.425, 0.722,
-0.358, 0.749, -0.361 (p < 0.01, for all correlations).

5.3 Final Inferences and Recommendations 

Some conclusions that can be drawn by combining the 
quantitative and qualitative data we collected are as fol- 
lows: 

• Comparison-based pairing methods utilizing the
visual channel are preferred by our users. Among
those, we recommend Phrase-DD over Numeric-Compare
as the former yields fewer (in fact, no)
errors. Since displays are ubiquitous on personal
devices participating in two-user scenarios,
Phrase-DD is a clear winner in terms of speed,
robustness and user preference, as well as universal
deployability.

• Among the methods utilizing the audio channel,
Over-Audio was the user favorite. Phrase-DS also
performed well in our tests, yielding a low completion
time and no errors. Phrase-SS was also
error-free, but relatively slow. We believe
that Phrase-DS is a better choice than
Over-Audio. This is because Over-Audio needs
a microphone on one device (devices such as laptops
might not always be equipped with a microphone)
and a speaker on the other. Another drawback
of Over-Audio is that it is hard for
users to accurately determine the actual source
of automated audio, and thus an automated audio
channel might be insecure in the presence of a
close-by attacker with an interfering audio device
[25] (see footnote 6).

• Beep-Blink and Copy-Confirm produced high error
rates in our tests and thus we do not recommend
using them.

• BEDA-Beep and BEDA-Blink demonstrated poor
usability in our tests. They usually take more than
one trial for successful completion and take too
long to complete. User ratings for these methods
are also relatively low and therefore we do not
recommend using them.

• In general, we recommend the use of Phrase-DD
when possible, complemented with
Phrase-DS to accommodate a
wider variety of devices with different interface
capabilities.

6 To detect the presence of a malicious interfering audio
device, it would be necessary for the users to perform the
manual comparison of SAS strings on both devices, e.g., by
using Phrase-DD or Phrase-DS [25]. This undermines the
advantage of using an automated audio channel at all.

6 Conclusions and Future Work

We presented the first experimental evaluation of
prominent device pairing methods that can be used in
two-user scenarios. First and foremost, our survey
results confirmed our belief that a majority of users are
concerned about the privacy of their personal devices
and would not be willing to hand their devices
to other users, even temporarily, to perform the pairing
process. This means that two-user pairing
cannot simply be reduced to single-user pairing.

The results of our usability study show that one
simple method, Phrase-DD, is quite attractive overall,
being both fast and error-tolerant as well as
user-friendly. It naturally appeals to two-user settings
where devices have displays of appropriate quality and
size. A slightly slower method, Phrase-DS, can seamlessly
inter-operate with Phrase-DD, for wider deployment
and for scenarios where one device has a speaker. We
also observed that, in general, the subjects often
decided the outcome of pairing based on mutual
agreement, which, we believe, may have helped to reduce
errors in most of our comparison-based methods.

We believe that our work is an important and timely 
first step in exploring real-world usability of secure 
device pairing methods for emerging two-user scenar- 
ios. Our work helps affirmatively answer the follow- 
ing important question: what pairing method(s) should 
one deploy in practice for two-user scenarios? With- 
out such a study, it would have been hard to answer 
this question, based on intuition or prior test results 
on single-user pairing scenarios. As our results show, 
what works well for ordinary users is often quite dif- 
ferent from what is imagined by researchers. More- 
over, usable security is a tricky subject where the user 
perception may be far from the reality, as our results 
indicate. 

In our future work, we plan on rigorously studying
the promising methods resulting from our current
study. Our plans for immediate future work include:

• Experiments with a more diverse pool of participants,
especially those who, unlike our current participants,
are not young and technology-savvy.

• Experiments involving different (more diverse) 
devices (such as laptops, PDAs). 

• Experiments in more realistic usage scenarios, 
outside of a lab-controlled environment. 